ABSTRACT

This paper is an introduction to some of the ideas of robust
statistical methods, as presented to the Fourth International
Congress for Mathematical Education, session on Exploratory Data
Analysis.

Most statistical methods taught and used today are very
sensitive to bad or atypical data and can give meaningless
results in their presence. Robust methods protect against these
undesirable effects and can be incorporated into the teaching of
statistics at all levels of complexity. We discuss the need for
robust methods to supplement (not replace) standard procedures,
suggest some considerations regarding teaching, and review some
of the fundamental concepts of robust estimation.
Robust Methods for EDA

Robust methods, which protect against undesirable effects
of unusual observations in the analysis of data, can easily
be incorporated into the teaching of statistics at all levels.
Because many of the basic concepts are simple, robustness
can and should be discussed when the student is being introduced
to statistical ideas.

Robustness should complement, not replace, standard
statistical tools such as means, variances, least squares
estimates, and other methods based on assumptions such as
the normal distribution. In fact, many statisticians now
recommend that a robust analysis be used routinely to help
assess the validity of a more classical analysis, because
hidden structure or problems with the data are often brought
to light. If the classical and robust analyses approximately
agree, this can be taken as a confirmation of the classical
results by a secondary analysis. But when they disagree, there
is work to be done because either errors in the data need to
be corrected, or else unexpected structure remains to be
discovered and explained.

The need for statistical robustness can be seen even in the
basic problem of finding an "average" value to summarize a
list of numbers. For example, to summarize the five numbers

7, 8, 6, 4, 100,

the arithmetic mean is

$$\text{Mean} = \frac{7 + 8 + 6 + 4 + 100}{5} = 25,$$

which is not a typical value! For some real-life problems,
25 would be the proper summary; but it is often better to
summarize the reasonable portion of the data (7, 8, 6, and 4) and to
study exceptional values (like 100) separately, for example,
to decide whether they are interesting special cases for further
study or simply in error.

The median is a robust measure of average which has half
of the numbers smaller and half larger than itself. For this
data set, it is

$$\text{Median} = \text{middle value of } (4, 6, 7, 8, 100) = 7,$$

which we see is a typical value.
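
The comparison above can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable name `data` is ours, not the paper's.

```python
from statistics import mean, median

# The five numbers from the text; 100 is the atypical value.
data = [7, 8, 6, 4, 100]

# The single large value pulls the mean far from the bulk of the data.
print(mean(data))    # 25

# The median stays at a typical value.
print(median(data))  # 7

# Summarizing only the "reasonable" portion gives a mean close to the median.
print(mean([7, 8, 6, 4]))  # 6.25
```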
Robustness, formally, is protection against unusual data
and violated assumptions. A few atypical or "bad" observations
can ruin an ordinary analysis, but will have only a very limited
effect on a robust analysis. Using robust methods is
analogous to taking out an insurance policy for protection
against the presence of bad data: the insurance premium is paid
as an increase in the sampling variation (a loss of efficiency)
of the estimate when the data are in fact well behaved.
In real data, errors are often present, and this "insurance" can
be vital. Robust methods also help in the detection of outliers
(atypical data), which can be very useful in error detection.

The teaching of robustness can proceed at many levels:
simple or complex, pencil or computer, in-class or independent
project. It can be taught separately as a section by itself,
but is also easily integrated with other statistical topics.
For example, after teaching a new standard procedure, some time
can be spent discussing methods of "robustifying" that method.
The use of examples is crucial, of course, to teaching any
statistical ideas and maintaining student interest; pictures
and graphic displays should be used frequently.

To illustrate some robust methods for location (average)
estimation, consider the attention spans of 10 hypothetical
students:

5, 18, 15, 2, 8, 55, 11, 3, 9, 8 minutes

The arithmetic mean (not robust) is

$$\text{mean} = \frac{5 + 18 + \cdots + 8}{10} = 13.4 \text{ minutes.}$$

The 10% trimmed mean (robust) is formed by
(1) ordering the data from smallest to largest, (2) trimming
(removing) 10% of the data from each side, and (3) taking the
arithmetic mean of what remains.

1) order: 2, 3, 5, 8, 8, 9, 11, 15, 18, 55

2) trim 2 and 55

3) $$\text{10\% trimmed mean} = \frac{3 + 5 + \cdots + 18}{8} = 9.6 \text{ minutes.}$$

The median (very robust against atypical values) is

$$\text{median} = \frac{8 + 9}{2} = 8.5 \text{ minutes.}$$

These estimators (mean, trimmed mean, median) are all examples
of a rich family of location estimators called L-estimates,
which are linear combinations of order statistics.
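
To make "linear combinations of order statistics" concrete, the three estimators can each be written as a set of weights applied to the sorted data. The sketch below is illustrative only; the function `l_estimate` and the weight vectors are our own names, and the weights are written out for this sample of 10 values rather than for general n.

```python
def l_estimate(data, weights):
    """Weighted sum of the order statistics (the sorted data)."""
    ordered = sorted(data)
    if len(weights) != len(ordered):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, ordered))


spans = [5, 18, 15, 2, 8, 55, 11, 3, 9, 8]  # attention spans, in minutes
n = len(spans)

# Mean: every order statistic gets the same weight.
mean_w = [1 / n] * n
# 10% trimmed mean: zero weight on the smallest and largest value.
trim_w = [0] + [1 / (n - 2)] * (n - 2) + [0]
# Median (n = 10): all weight on the two middle order statistics.
median_w = [0] * 4 + [0.5, 0.5] + [0] * 4

print(l_estimate(spans, mean_w))    # about 13.4
print(l_estimate(spans, trim_w))    # 9.625, reported as 9.6 in the text
print(l_estimate(spans, median_w))  # 8.5
```

Seen this way, robustness comes from how little weight is placed on the extreme order statistics.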

Another useful class that also includes robust members is
the M-estimates, which generalize least squares and maximum
likelihood procedures. These include the arithmetic mean,
which minimizes the sum of squared deviations

$$\sum_{i=1}^{n} (x_i - \theta)^2 .$$

In place of squaring, M-estimates allow a function $\rho$ that
can be less sensitive to outliers. We minimize

$$\sum_{i=1}^{n} \rho(x_i - \theta)$$

by differentiating and solving

$$\sum_{i=1}^{n} \psi(x_i - \theta) = 0,$$

where $\psi = (\text{constant}) \cdot (d\rho/d\theta)$. Different choices of $\rho$ lead
to different M-estimates with different properties. Some
examples are given in Figure 1.

The median, an M-estimate with $\rho(x) = |x|$, is extremely
resistant to bad data but suffers from "granularity", a lack
of responsiveness to data near the central value. The Huber
choice for $\rho$ corrects this problem: near zero, it is like the
mean, allowing data near the average to "fine-tune" the estimate,
while maintaining resistance to bad data by behaving like the
median away from the middle. Tukey's Bisquare also combines
efficiency and robustness, but has a $\psi$ that "redescends" to
zero; in effect, this says that data that are very far from the
middle will not be believed, and will have zero effect on the
estimate.
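
As an illustration of the Huber idea, the sketch below solves the estimating equation for a Huber-type psi by iteratively reweighted averaging. Several things here are assumptions not taken from the paper: the clipping constant `k` is given directly in data units (in practice the residuals are usually divided by a robust scale estimate first), the value k = 10 is arbitrary, and the function names are ours.

```python
from statistics import median


def huber_psi(r, k):
    """Huber-type psi: equal to r near zero, clipped to +/- k far away."""
    return max(-k, min(k, r))


def huber_location(data, k, tol=1e-9, max_iter=200):
    """Solve sum(psi(x_i - theta)) = 0 by iteratively reweighted averaging.

    Assumption: k is in the same units as the data (no scale estimate).
    """
    theta = median(data)  # robust starting value
    for _ in range(max_iter):
        # Weight psi(r)/r: 1 for nearby points, k/|r| for distant ones.
        weights = [1.0 if abs(x - theta) <= k else k / abs(x - theta)
                   for x in data]
        new_theta = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


spans = [5, 18, 15, 2, 8, 55, 11, 3, 9, 8]  # attention spans, in minutes
theta = huber_location(spans, k=10.0)
print(theta)  # about 9.89: between the median (8.5) and the mean (13.4)
```

With a very large `k` nothing is clipped and the procedure returns the ordinary mean, which is one way to see the mean as a member of the same family.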

For easy pencil-and-paper calculation, L-estimates are
preferable, because the minimization step for M-estimates (other
than the mean and median) is best attempted with a pocket
calculator or computer.


The proportion of bad data that an estimation procedure
can tolerate and still return a sensible answer is its
Breakdown Value. The mean has a breakdown value of zero, because
by changing the value of even a single number, the mean can
be forced to assume any value, as in Figure 2a. The median
has a breakdown value of 50% because almost half of the data
must be changed before the median breaks down completely,
as illustrated in Figure 2b. Note that extreme observations do
have an effect upon the median (compare the second and third
parts of Figure 2b). Also note that when 3 of 5 points (more
than 50%) are moved, the median breaks down as shown in
Figure 2b. Breakdown Values of trimmed means lie in between
those of the mean and median; for example, the 10% trimmed
mean has a Breakdown Value of 10%.
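
The breakdown behaviour can be tried out numerically. The five "clean" values below are hypothetical and are not the data of Figure 2; the corrupted value 1e6 stands in for an arbitrarily bad observation.

```python
from statistics import mean, median

clean = [4, 6, 7, 8, 9]          # hypothetical well-behaved sample
one_bad = [4, 6, 7, 8, 1e6]      # 1 of 5 points corrupted
two_bad = [1e6, 1e6, 7, 8, 9]    # 2 of 5 points corrupted (under 50%)
three_bad = [1e6, 1e6, 1e6, 8, 9]  # 3 of 5 points corrupted (over 50%)

# One bad point is enough to carry the mean away.
print(mean(clean), mean(one_bad))

# The median moves when points are corrupted, but it stays within
# the range of the good data as long as fewer than half are bad.
print(median(clean), median(one_bad), median(two_bad))

# With more than half of the points corrupted, the median breaks down too.
print(median(three_bad))
```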


FIGURE 2a. The mean has 0% breakdown value.
FIGURE 2b. The median has 50% breakdown value.
One measure of robustness of an estimate is provided by
measuring the effect of adding a new point $x$ to a sample
$x_1, \ldots, x_n$. The Influence Function of the estimate $\hat\theta$ at the
value $x$ is defined to be

$$I_+(x, \hat\theta) = (n+1) \left[ \hat\theta(x_1, \ldots, x_n, x) - \hat\theta(x_1, \ldots, x_n) \right].$$

For example, if $\hat\theta$ is the mean $\bar{x} = (\sum x_i)/n$, we can calculate

$$I_+(x, \bar{x}) = x - \bar{x}.$$

Plotting $I_+$, we see that the mean has an unbounded Influence Function, and
is therefore not robust because there is no limit to the effect
a single new point can have on the mean. For M-estimates, $I_+$ is
very much like $\psi$.
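
The definition can be applied directly to any estimator by recomputing it with one extra point. The sketch below does this for the mean and the median; the helper name `influence` and the four-value sample are our own choices for illustration.

```python
from statistics import mean, median


def influence(estimator, sample, x):
    """(n + 1) * [estimate with x added - estimate without x]."""
    n = len(sample)
    return (n + 1) * (estimator(sample + [x]) - estimator(sample))


sample = [7, 8, 6, 4]  # sample mean 6.25, sample median 6.5

# For the mean, the influence of x is exactly x minus the sample mean,
# so it grows without limit as x moves away.
for x in (10, 100, 1000):
    print(x, influence(mean, sample, x), x - mean(sample))

# For the median, the influence levels off: a far-away point has the
# same effect no matter how far away it is.
for x in (10, 100, 1000):
    print(x, influence(median, sample, x))
```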

Several alternatives exist for estimating scale that
robustify the standard deviation:

The "MAD" (Median Absolute Deviation from the median) is obtained
by replacing means by medians:

$$\text{MAD} = \text{Median}(|x_1 - m|, |x_2 - m|, \ldots, |x_n - m|),$$

where $m = \text{Median}(x_1, \ldots, x_n)$.

For example, the earlier data set 7, 8, 6, 4, 100 has SD = 42 but

$$\text{MAD} = \text{Median}(|7-7|, |8-7|, \ldots, |100-7|) = \text{Median}(0, 1, 1, 3, 93) = 1.$$

The large standard deviation, 42, is due to the fact that 100 is
very far from most of the data set. The MAD, 1, is smaller
because this single large contribution does not dominate.
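
The two scale estimates can be compared with a short computation. This sketch assumes the standard deviation quoted in the text is the usual sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the function name `mad` is ours.

```python
from statistics import median, stdev


def mad(data):
    """Median of the absolute deviations from the median."""
    m = median(data)
    return median([abs(x - m) for x in data])


data = [7, 8, 6, 4, 100]

print(stdev(data))  # about 41.95, i.e. 42 after rounding
print(mad(data))    # 1

# Without the atypical value, the two estimates are of similar size.
print(stdev([7, 8, 6, 4]), mad([7, 8, 6, 4]))
```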

Another  robust  scale  estimate  is  the  Interquartile  Range, 
simply  the  upper  quartile  minus  the  lower  quartile  (after  ordering 
the  data,  quartiles  are  1/4  of  the  way  in  from  each  end). 
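
A small sketch of the Interquartile Range follows. Textbooks differ on exactly how quartiles are interpolated; here we assume the "inclusive" rule of Python's `statistics.quantiles`, which is one reasonable reading of "1/4 of the way in from each end" but is our choice, not the paper's.

```python
from statistics import quantiles


def iqr(data):
    """Upper quartile minus lower quartile (inclusive interpolation)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


spans = [5, 18, 15, 2, 8, 55, 11, 3, 9, 8]     # attention spans, in minutes
wild = [5, 18, 15, 2, 8, 5500, 11, 3, 9, 8]    # same data, one wilder outlier

# The quartiles do not depend on how extreme the largest value is,
# so both samples give the same Interquartile Range.
print(iqr(spans), iqr(wild))
```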

Linear regression, fitting a straight line to points in two
dimensions, can also be robustified, for example with M-estimation
techniques. However, even M-estimates can break down in a
situation as in Figure 3. Which line do we want? The answer is
"both lines". If the high-leverage point is in error, we prefer
a robust line, such as the repeated median line (Launer and
Siegel 1981). But the least squares line is preferable when that
outlying point is correct, because in this case that single point
provides nearly all our information about the slope!
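
To see the two lines side by side, the sketch below fits both to a small made-up data set with one high-leverage point. The data are hypothetical (not those of Figure 3). The repeated median slope is taken as the median, over points i, of the median pairwise slope from point i to every other point; the intercept is computed here as the median of y - slope * x, which is one common variant and an assumption on our part.

```python
from statistics import mean, median


def least_squares_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def repeated_median_line(xs, ys):
    """Repeated median slope; intercept as median residual (assumed variant).

    Requires all x values to be distinct.
    """
    n = len(xs)
    slope = median(
        median((ys[j] - ys[i]) / (xs[j] - xs[i]) for j in range(n) if j != i)
        for i in range(n)
    )
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Five points exactly on y = 2x, plus one high-leverage point at x = 20.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 0.0]

print(least_squares_line(xs, ys))    # slope is negative: dragged by (20, 0)
print(repeated_median_line(xs, ys))  # (2.0, 0.0): follows the five points
```

Which answer is "right" depends, as the text says, on whether the point at x = 20 is an error or a genuine observation.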

A  final  example  of  the  usefulness  of  robust  methods  is  the 
fitting  of  two  related  shapes.  Consider  a  square  with  one  point 
distorted  (dotted  shape)  fitted  to  a  perfect  square  (solid 
shape)  by  allowing  rotation,  translation,  and  magnification. 
Figure  4  indicates  the  least  squares  fit,  and  the  robust  fit  by 
Repeated  Medians.  Because  the  robust  method  "fits  what  fits" 
it  indicates  clearly  that  the  dotted  shape  is  identical  to  a 
square except at one point. The least squares fit, by compromising
and trying to fit too much, makes this sort of inference
much  more  difficult.  Practical  application  of  this  type  of  shape 
fitting  has  been  demonstrated  by  Siegel  and  Benson  (1980)  in  the 
comparison  of  fossil  shapes  and  of  human  skulls. 




Robust methods are also available for correlation, time
series, and two-way analysis in addition to the location,
scale, and regression problems discussed here. For more
information, we refer you to the reference list that follows.
Remember that robustness is a young field (although its roots
are deep in the past) and we can expect more books to become
available in the near future.